Subset Selection Algorithms: Randomized vs. Deterministic

Authors

  • Mary E. Broadbent
  • Martin Brown
  • Kevin Penner
  • Ilse Ipsen
  • Rizwana Rehman
Abstract

Subset selection is a method for selecting a subset of columns from a real matrix so that the subset represents the entire matrix well and is far from being rank deficient. We begin by extending a deterministic subset selection algorithm to matrices that have more columns than rows. Then we investigate a two-stage subset selection algorithm that uses a randomized stage to pick a smaller number of candidate columns, which are forwarded to the deterministic stage for subset selection. We perform extensive numerical experiments to compare the accuracy of this algorithm with the best known deterministic algorithm. We also introduce an iterative algorithm that systematically determines the number of candidate columns picked in the randomized stage, and we provide a recommendation for a specific value. Motivated by our experimental results, we propose a new two-stage deterministic algorithm for subset selection. In our numerical experiments, this new algorithm appears to be as accurate as the best deterministic algorithm, but it is faster, and it is also easier to implement than the randomized algorithm.
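The two-stage idea described in the abstract can be illustrated with a rough sketch. The abstract does not specify either stage in detail, so everything below is an assumption made for illustration: the randomized stage samples candidate columns with probability proportional to their squared norms (a simple stand-in, not necessarily the sampling scheme used in the paper), and the deterministic stage is a greedy column-pivoted Gram-Schmidt selection (a stand-in for a rank-revealing QR step). The function names `sample_candidates`, `pivoted_selection`, and `two_stage_selection` are hypothetical.

```python
import random


def pivoted_selection(columns, k):
    """Deterministic stage (assumed): greedy column-pivoted Gram-Schmidt.

    `columns` is a list of columns, each a list of floats.
    Returns the indices of at most k selected columns.
    """
    residual = [list(col) for col in columns]  # working copies
    chosen = []
    for _ in range(min(k, len(residual))):
        norms = [sum(x * x for x in col) for col in residual]
        # Pivot on the not-yet-chosen column with the largest residual norm.
        j = max((i for i in range(len(residual)) if i not in chosen),
                key=norms.__getitem__)
        if norms[j] <= 1e-24:  # remaining columns are numerically dependent
            break
        chosen.append(j)
        scale = norms[j] ** 0.5
        q = [x / scale for x in residual[j]]
        # Remove the component along q from every column.
        for col in residual:
            dot = sum(a * b for a, b in zip(q, col))
            for r in range(len(col)):
                col[r] -= dot * q[r]
    return chosen


def sample_candidates(columns, c, rng):
    """Randomized stage (assumed): draw c indices with replacement,
    weighted by squared column norm, and keep the distinct ones."""
    weights = [sum(x * x for x in col) for col in columns]
    picked = set(rng.choices(range(len(columns)), weights=weights, k=c))
    return sorted(picked)


def two_stage_selection(columns, k, c, seed=0):
    """Sample candidates, then select at most k of them deterministically."""
    rng = random.Random(seed)
    candidates = sample_candidates(columns, c, rng)
    local = pivoted_selection([columns[j] for j in candidates], k)
    return [candidates[i] for i in local]


if __name__ == "__main__":
    # Four columns in R^2; columns 0 and 1 are parallel, as are 2 and 3.
    cols = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 0.5]]
    print(pivoted_selection(cols, 2))
    print(two_stage_selection(cols, k=2, c=8, seed=1))
```

Because the sampling is done with replacement, the randomized stage may return fewer than `c` distinct candidates, and the final selection may then contain fewer than `k` columns. This is one reason the number of candidate columns matters in such a scheme.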

Similar resources

Ridge Regression and Provable Deterministic Ridge Leverage Score Sampling

Ridge leverage scores provide a balance between low-rank approximation and regularization, and are ubiquitous in randomized linear algebra and machine learning. Deterministic algorithms are also of interest in the moderately big data regime, because deterministic algorithms provide interpretability to the practitioner by having no failure probability and always returning the same results. We pr...

Faster Subset Selection for Matrices and Applications

We study the following problem of subset selection for matrices: given a matrix X ∈ R^{n×m} (m > n) and a sampling parameter k (n ≤ k ≤ m), select a subset of k columns from X such that the pseudoinverse of the sampled matrix has as small a norm as possible. In this work, we focus on the Frobenius and the spectral matrix norms. We describe several novel (deterministic and randomized) approximation...

Practical Algorithms for Selection on Coarse-Grained Parallel Computers

In this paper, we consider the problem of selection on coarse-grained distributed memory parallel computers. We discuss several deterministic and randomized algorithms for parallel selection. Experimental results on the CM5 demonstrate that randomized algorithms are superior to their deterministic counterparts.

.1 Error

Randomized algorithms have an additional primitive operation that deterministic algorithms do not have. We can select a number from a range [1 ... x] uniformly at random, at a cost assumed to be linearly dependent on the size of x in binary representation. The algorithm then makes a decision based on the outcome of this random selection. We first look at some defining characteristics of random...

Tree Pattern Matching to Subset Matching in Linear Time

In this paper, we show an O(n + m) time Turing reduction from the tree pattern matching problem to another problem called the subset matching problem. Subsequent works have given efficient deterministic and randomized algorithms for the subset matching problem. Together, these works yield an O(n log m + m) time deterministic algorithm and an O(n log n + m) time Monte Carlo algorithm for the tre...

Publication date: 2010